DESIGNED TO HARM

Grok, sexual abuse imagery, and the architecture of technopatriarchy

By Asma Derja and Sara Albakri | Edited by Harshini Rajachander | 18 min read


March 8th, 2026

 

Francisco Goya, The Sleep of Reason Produces Monsters, 1799, from Los Caprichos. Public domain

Between July 2025 and January 2026, Grok generated hundreds of thousands of sexualized images per day. Thousands depicted what appeared to be minors. Three million images in under two weeks. Eighty-one percent of them: women. The dominant public response asked why the filters failed. This piece argues that this is the wrong question, and that asking it reveals something important about what governance is willing to see. What follows are two accounts of the same event: one structural, one historical. Together, they make the case that what happened with Grok was neither accident nor aberration. It was design operating exactly as precedent allowed. Because internal development processes are not publicly disclosed, this analysis focuses on observable platform features, documented incidents, and their structural implications.



PART 1

Grok Didn't Just Fail; It Was Designed This Way

By Asma Derja, Founder, Ethical AI Alliance

 

Our understanding of how Grok came to operate as it did is constrained by a fundamental asymmetry: we can observe outputs, platform integration, and user behavior. We cannot observe the internal design decisions, risk assessments, training datasets, or trade-off deliberations that made those outcomes possible. Training data is not a neutral input; what a model is trained on determines what it outputs. When that data composition is opaque, meaning neither regulators nor the public can audit what images, text, or behavioral patterns shaped the model's outputs, the upstream accountability gap begins not at deployment but at the dataset level. Here’s the thing: the current governance model treats AI safety as a post-deployment problem, measurable through outputs and addressable through moderation. But this is not a new strategy.

It is the same playbook social media companies ran for two decades: deploy first, moderate later, and treat the resulting harm as a content problem rather than a design problem. We know how that ended.

But the choices in generative AI happen even further upstream than a feed algorithm, in spaces where external actors have no sightlines at all. This part examines what we can observe about Grok's design and what remains internal. It argues that the gap between these two domains, the visible and the opaque, is where governance currently fails. We talk about AI systems "allowing" harmful outputs, as if the system is neutral and users are the active agents. But that frame inverts how generative systems actually work. As Langdon Winner argued in his foundational work on technological politics, artefacts are not neutral; they embody forms of authority and political arrangement. Design determines: what requires effort and what is frictionless; what is visible and what is hidden; what is treated as legitimate use and what is not; what is scalable and what stays isolated.

These are exercises of power made upstream, before a single user opens the interface. Grok's design didn't "fail to prevent" sexual abuse imagery. It constructed a system where generating that imagery was easier, faster, and more socially legible than in any previous consumer-facing AI product.

 

The Affordance Stack: What Grok Made Frictionless

Think of design as a stack of affordances: choices that compound, each layer amplifying the next. What follows is an account of what we can observe externally about Grok's design, and what remains hidden in internal decision-making processes.


Layer 1: Zero-Cost Generation

What we know externally: before Grok, creating sexualized images required graphic design skills, technical setup (downloading deepfake software, configuring backends), or access to niche communities. Each created friction, not moral friction but practical friction: effort, time, technical knowledge, social discovery costs. Grok collapsed all of that to typing a prompt and waiting ten seconds. The cost, in effort, in time, in expertise, dropped to effectively zero.

What we don't know internally: we don't know what risk assessments, if any, were conducted on the implications of zero-cost generation at scale. We don't know whether safety teams flagged this as a predictable harm vector. We don't know what trade-offs between accessibility and abuse prevention were deliberated, who made the final call, or what alternatives were considered. When cost drops to zero, volume explodes. It's basic economics (applied to harm production in this context). And when that economic shift happens without visible upstream constraint, the question is: was this outcome anticipated, accepted, or simply not considered?
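To make the scale mechanics concrete, here is a back-of-envelope sketch of what zero-cost generation implies in volume terms. Every number in it is an illustrative assumption (generation latency, hours of use, number of accounts), not a figure reported about Grok; the point is only that once per-image cost approaches zero, even modest assumptions produce volumes no human-operated abuse pipeline could match.

```python
# Back-of-envelope sketch. Every figure here is an illustrative assumption,
# not a reported number about Grok; the point is the shape of the curve.

SECONDS_PER_IMAGE = 10        # assumed generation latency per prompt
HOURS_ACTIVE_PER_DAY = 4      # assumed casual use by a single account
ACCOUNTS = 10_000             # assumed tiny fraction of a 650M-user platform

images_per_account_per_day = (HOURS_ACTIVE_PER_DAY * 3600) // SECONDS_PER_IMAGE
platform_daily_volume = images_per_account_per_day * ACCOUNTS

print(f"{images_per_account_per_day:,} images per account per day")   # 1,440
print(f"{platform_daily_volume:,} images per day in aggregate")       # 14,400,000
```

Under these assumptions a single casual account clears over a thousand images a day before any amplification; the reported rate of hundreds of thousands per day across the platform sits well inside that mechanical ceiling.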

 

Layer 2: Permissiveness as Feature

What we know externally: according to platform documentation and third-party reporting, Grok launched with a setting labeled "Spicy Mode." This feature was described in platform documentation and by journalists, including Wired, as allowing less filtered, more explicit outputs. In independent testing, Spicy Mode enabled the generation of nudity and sexually explicit deepfakes that other systems explicitly refuse. This was an actual product feature, labeled and exposed in the interface.

The significance of labeling this mode "spicy" cannot be overstated. In design terms, it transformed what might have been an internal constraint toggle into what Madeleine Akrich calls a "script": an inscription of expected behavior into the interface itself. Permissiveness was presented as a positive attribute. The system did not merely allow boundary-pushing prompts; it anticipated them, legitimized them, and reduced the cognitive friction that might otherwise make users hesitate. Helen Nissenbaum's concept of contextual integrity clarifies what happens here. When a system reframes boundary-crossing behavior as normal use, it violates existing social expectations about context, consent, and appropriateness, even if no explicit rule is broken. The interface is performing normalization work.

What we don't know internally: we don't know the internal reasoning for naming a mode "spicy" rather than something more conventional. Did this originate from product marketing logic, engineering culture, or a strategic positioning choice? We don't know whether legal or ethics teams reviewed this framing. We don't know what discussions, if any, occurred about how this label would be interpreted by users or whether it would signal legitimacy for harmful requests.

What we can observe is the outcome: a design choice that collapsed normative barriers by reframing them as product differentiation. And it fits a legible pattern. Since Musk's acquisition of X in 2022, the platform has systematically dismantled safety conventions, loosened content policies, and repositioned provocation as a feature. Musk himself posted lewd content on the platform before Grok's image generation launched. "Spicy Mode" did not emerge from nowhere.
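What such a design choice amounts to mechanically can be sketched in a few lines. This is a purely hypothetical illustration, not xAI's implementation; the flag, the thresholds, and the scoring function are all invented for the example. The structural point it shows is that a permissive mode does not add a capability. It lowers, at design time, the bar an output must clear before being served, for every user at once.

```python
# Hypothetical sketch of a permissive-mode gate; not xAI's code.
# The flag name and thresholds are invented for illustration.

from dataclasses import dataclass


@dataclass
class GenerationRequest:
    prompt: str
    spicy_mode: bool  # user-facing toggle, framed in the UI as a feature


def risk_threshold(request: GenerationRequest) -> float:
    # The permissive mode does not introduce a new check;
    # it raises the amount of policy risk the system will tolerate.
    return 0.95 if request.spicy_mode else 0.60


def should_serve(risk_score: float, request: GenerationRequest) -> bool:
    # risk_score in [0, 1]; higher means more likely to violate policy.
    return risk_score < risk_threshold(request)


# The same borderline output (risk 0.8) is refused under the default gate
# and served once the toggle reframes permissiveness as product differentiation.
borderline = 0.80
print(should_serve(borderline, GenerationRequest("...", spicy_mode=False)))  # False
print(should_serve(borderline, GenerationRequest("...", spicy_mode=True)))   # True
```

Nothing about the model changes in such a feature, only a constant set before launch, which is exactly the kind of upstream decision that external accountability currently cannot reach.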

 

Layer 3: Platform x Amplification x Infrastructure

What we know externally: Grok didn't launch as a standalone app. It launched integrated into X, a platform with 650 million monthly users and a core affordance designed for viral spread: the retweet button. This integration collapsed distribution cost. You don't need to find an audience, set up a channel, or navigate platform upload restrictions. The image generates inside the distribution infrastructure. One click to create, one click to amplify. Visibility became a default rather than a choice. Generated images appeared in feeds, in replies, in quote-tweets. The architecture assumed shareability; privacy required opting out.

And the model powering all of this was pre-trained on datasets that were only partially disclosed. Reporting on the generative AI industry indicates that many models are trained on large-scale datasets scraped from vast portions of the internet, including adult and sexually explicit material from darknet sites and platforms like OnlyFans. When a model learns from sexualized imagery at scale, its fluency in generating that imagery is a direct output of what it was trained on.

What we don't know internally: we don't know what risk assessments were conducted before integrating generative image capabilities into a platform built for viral distribution. We don't know whether product teams modeled what would happen when zero-cost generation met one-click amplification. We don't know what governance checkpoints, if any, existed to evaluate this combination before launch.

What we can observe: the affordance stack now looked like this: zero-cost generation (no technical barrier) + legitimized permissiveness (no normative barrier) + built-in amplification (no distribution barrier). Each layer compounds the last, together producing a system in which abusive outputs could be generated and distributed at scale.

 

So Who Is Responsible?

In traditional content moderation, responsibility is relatively clear. A user uploads illegal content, the platform takes it down, and if the platform fails to act, it faces liability. But generative systems fragment responsibility in ways that existing frameworks struggle to map:

  • The model developer (xAI) says: "We built a general purpose tool. Users control the inputs."

  • The platform (X) says: "We host the infrastructure. We're not responsible for what the AI generates."

  • The user says: "The system let me do it. I didn't hack anything. I used it as designed."

None of these actors is lying. The system genuinely is designed such that responsibility is architecturally ambiguous. And that ambiguity is not accidental; it is a design outcome. By separating generation (xAI), distribution (X), and use (individual accounts), the system ensures that when harm occurs at scale, no single entity can be held accountable for the design choices that made scale possible.

 

What we know externally: When public outcry intensified, xAI restricted image generation to paying subscribers rather than removing the feature. The response was to monetize access to it. And that restriction proved porous. Reddit communities actively shared workarounds and prompt hacks to replicate Spicy Mode's original behavior. No monitoring system existed to track or close these gaps. The harm infrastructure remained intact; access just required slightly more effort.

What we don't know internally: we don't know whether there were internal escalation channels that flagged the scale of harm before the public response. We don't know how economic incentives (subscription growth, engagement metrics, competitive positioning) influenced the decision to restrict rather than remove. We don't know what legal, ethical, or safety arguments were made in those deliberations, or who had final authority.

This is why post-hoc moderation fails so predictably. You can take down individual images. You can ban accounts. But you cannot retroactively un-design the affordance stack. You cannot remove the structural choices that made 3 million images in two weeks mechanically achievable. The harm is in the infrastructure that treats mass production of sexual abuse imagery as a normal operating condition.



What Happens When Design Crosses a Systems Threshold

There's a specific moment when design structurally embeds harm, and that threshold is crossed when:

  1. The cost of harmful behavior drops to near-zero

  2. The system frames that behavior as legitimate use

  3. Distribution infrastructure makes repetition and scale automatic

  4. Internal decision processes remain opaque to external accountability

Grok crossed that threshold because the combination of features created a system where abuse wasn't an edge case; it was the operating equilibrium. Researchers found that even after images were removed, they remained accessible via direct URLs; regeneration was instant and links persisted.

Michel Foucault's work on normalization offers the analytical framework for understanding what happened. Power, in his account, does not primarily operate through prohibition. It operates through classification, repetition, and the production of what comes to feel ordinary. Systems do not need to compel behavior to shape it. They need only to make certain actions easier to perform, easier to justify, and easier to repeat. That is precisely what Grok's design accomplished. The Reddit communities sharing Spicy Mode workarounds are themselves a Foucauldian moment. Users were not coerced. They were classifying, sharing, repeating, normalizing. The system shaped the behavior without issuing a single instruction.
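The persistence finding is worth pausing on because it is checkable from outside. A minimal sketch of how a researcher might verify it, assuming a list of direct asset URLs collected before a takedown (the URLs below are placeholders, not real links), looks like this:

```python
# Minimal verification sketch: do "removed" assets still resolve at their
# direct URLs? The URLs are placeholders; the probe is an ordinary HTTP check.

import requests


def still_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the asset at `url` still resolves after a takedown."""
    try:
        response = requests.head(url, timeout=timeout, allow_redirects=True)
        return response.status_code == 200
    except requests.RequestException:
        return False


# Hypothetical direct-asset URLs recorded before content was "removed":
reported_urls = [
    "https://example-cdn.invalid/media/abc123.jpg",
    "https://example-cdn.invalid/media/def456.jpg",
]

persisting = [url for url in reported_urls if still_reachable(url)]
print(f"{len(persisting)} of {len(reported_urls)} removed assets still resolve")
```

Takedown at the interface layer says nothing about the storage and delivery layer beneath it, which is the point: moderation removes a pointer, not the infrastructure that keeps producing and serving the asset.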

 

Why This Matters Beyond Grok

Grok is not an outlier, and it won't be the last case. The design choices that enabled it are increasingly standard in the generative AI product landscape:

  • Zero-cost generation is the baseline expectation for AI tools

  • Permissiveness as differentiation is a common competitive strategy

  • Platform integration is how AI features reach scale

  • Distributed responsibility is the default legal architecture

The lesson is not "Grok is uniquely bad." The lesson here is: this is what AI systems look like when design proceeds without enforceable upstream accountability for harms that only become visible at scale.

You can't moderate your way out of a design problem. You can't ‘terms-of-service’ your way out of an affordance stack. The only intervention point is before the system ships. At the design layer. That's where governance needs to operate.

 



PART 2

Bodies as Baseline: Why Grok's Harm Was Historically Inevitable

By Sara Albakri, Student Ambassador, Ethical AI Alliance

Part 1 showed how design choices compound: zero-cost generation, legitimized permissiveness, and built-in amplification combined to cross a systems threshold at which harm became a normal operating condition. But to understand why these choices were made, and why they were tolerated, we need to look beyond architecture. The design decisions did not emerge from a vacuum. They emerged from a context in which women's bodies have long functioned as the permissible testing ground for new technologies. What looks like a governance failure is also a historical continuity.

‘If we regard technology as neutral but subject to possible misuse, we will be blinded to the consequence of artefacts being designed and developed in particular ways that embody gendered power relations’

Judy Wajcman, Technofeminism



How could this happen? 

Ever felt the discomfort of a man undressing you with his eyes? Well, look no further. Grok’s image generator has upgraded perverted creativity to HD quality! Its system can produce the equivalent of 6,000 sexualized images of women every hour. In under two weeks, 3 million sexualized images were generated, 81% of them depicting women.

Marketed as an innovative and creative tool, Grok’s generative image function has attracted heavy scrutiny for the harms it inflicts on women’s autonomy and safety online. But while attention fixates on Grok’s rising production and dissemination of sexual abuse material, recognising the patriarchal scaffolding that ensured mass objectification happened to women first is crucial to realising how predictable it all was.

The objectification of women is not a novelty, nor an individual misuse issue, but a deeply cultural and capitalist phenomenon sustained through social norms, representation, and power. Women are consistently framed as less autonomous bodies whose value lies in eliciting visual desire. Embedded patriarchal practices allowed regulatory protections against the sexual abuse of women to be absent, to seem an inevitable trade-off of Grok’s launch. Adopting a technofeminist lens lends an understanding of how the intentional absence of guardrails against gendered harm perpetuates historical patterns of sexual exploitation and harassment that discipline women’s behaviour.

In this vein, Grok is presented as a product of patriarchal conditioning and, quite simply, the newest tool to police women’s behaviour.

‘Innovative’ technology’s long track record of failing to operate as a gender-neutral domain makes women’s position as frontline victims of harm not incidental but entirely predictable. Gendered discrimination has historically fallen within the bounds of the permissive testing ground for technological innovation. Whether captured in early erotic paintings, film, or contemporary AI models, patriarchal power structures shape the production, distribution, and consumption of visual media. Laura Mulvey’s concept of the ‘male gaze’ provides an example. She critiques Hollywood cinema for perpetuating an asymmetrical dynamic in which women exist as objects of visual pleasure: from the early inception of film, their bodies were sexualised and fragmented by a camera most often controlled by male directors. Technology has never existed within a gender-neutral domain, and its use has reproduced gendered inequalities.

For the first time, real women and children are at the complete whim of men's imagination. The Internet Watch Foundation confirmed that content circulating included sexualized images of girls appearing to be around eleven years old. Grok generated an estimated 23,000 sexualized images of children in just eleven days. In one documented instance, Grok itself posted an AI-generated image of two girls aged 12-16 in sexualized attire, later stating it "deeply regretted" the incident.

 

Ambient Threat as a control mechanism

An extremely disturbing aspect of AI-generated sexual abuse is its reification of contemporary patterns of gendered violence. Women’s bodies have long been sites of cultural and systemic policing, their autonomy undermined both in physical spaces (through harassment, assault, and discrimination) and in digital spaces (through surveillance, and now AI-generated sexual abuse). Grok’s model, in generating thousands of sexualised images per day, dictates what is safe and unsafe in women’s online participation. This straitjackets their behavioural freedom and visibility, out of fear that their own bodies will be used against them. Just as the universally upheld concern for women’s safety at night disciplines them offline, the rise in digital gendered abuse disciplines women’s behaviour, voice, and agency in the offline world too.

 

 Seen but not heard

AI-generated sexual abuse produces a uniquely harrowing form of harm, one that alienates survivors from their sense of self. This aligns with Fredrickson and Roberts’ (1997) objectification theory, which illustrates how women are socialised to internalise an observer’s perspective of themselves and to behave under the standards of external surveillance. Margaret Atwood captures the same idea: 'You are a woman with a man inside watching a woman. You are your own voyeur' (The Robber Bride, 1993). These concepts describe a kind of double consciousness women exist within: a split between grasping where the roots of their identity begin and where the projected expectations of the male voyeur have distorted their behaviour, ambitions and ideals.

AI-generated material tears a chasm between the two. One body is lived in, with memory, emotion and experience. Another body circulates, optimised and voyeuristically watched, without input from the person who inhabits it. This irrevocably changes the nature of objectification. Objectification used to require an observer. Now it can be generated in seconds. Women bear the brunt of hollowly occupying their physical body while duplicated bodies circulate the world without their knowledge or consent.

This creates a self-surveillance (Mulvey, 1975) that produces a persistent awareness of being perceived online; the threat of digital sexual violation forces women to sacrifice their freedom to simply exist. The constant preoccupation with and fear of AI-generated deepfakes cuts women off from healthy engagement online and from the benefits of public participation and discourse. We see the offline analogue in the control, coercion and threat of sexual violence removing women in Afghanistan from the public sphere, and in threats of rape used to silence female Palestinian political activists protesting against Israeli occupation: to silence and discipline any woman who takes up 'too much space’.

This was the case with Kylie Brewer, an openly queer feminist content creator and political activist, who faced waves of sexually explicit AI-generated images of herself made by her male critics, including requests to add bruises, bondage, blood and tears to images of her likeness. Someone compiled these images and created an OnlyFans account in her name, charging subscribers for access. "It was the most dejected that I've ever felt," Brewer said. Here Grok was wielded as a tool to silence and humiliate women out of having a voice and engaging in public discourse. No voice. Seen and not heard.

Upon speaking up against this abuse, Dr Daisy Dixon and Jess Davies both received heightened backlash in the form of an upsurge of still more aggressively graphic sexual images of themselves. This reinforces sexual coercion as a disciplining power against women, one that disproportionately abuses women of colour as frontline victims of harm. A 2021 UNESCO-commissioned study, The Chilling, found that while 64% of white women journalists reported experiencing online violence, that figure rose to 81% for Black women journalists and 86% for Indigenous women, evidence that race compounds exposure to gendered harm online.

Melissa Heikkilä, then a senior reporter at MIT Technology Review, documented this in 2022, not with Grok but with the Lensa app. As a woman of Asian heritage, she was met with sexually explicit, anime-style caricatures of herself, jarringly distinguishable from the beautified avatars the app generated for her white colleagues. The issue lies in the sexualized nature of Asian women's representation in training data, which only amplifies harm. The pattern Heikkilä documented in 2022 reappears in every generative system trained on the same internet, Grok included. These design and data choices can reproduce the racialized and gendered stereotypes present in training data. It leads us to question how the resulting generated images will themselves act as training data for future models, creating an infinite cycle of racialised harm. We must question our ideas of progression; the saying ‘history repeats itself’ is tame in the face of generative AI systems that regurgitate backward stereotypes under the veil of technological objectivity.

 

An Unsafe World for us all … 

Grok’s generative function normalises sexual abuse and makes it widely accessible. Users who may not have previously considered this can easily play around and test Grok's limits as a light-hearted pastime. The sexualisation of women and girls becomes far easier. Set against a backdrop of rising child sexual abuse material and paedophilia, the accessibility of generating sexual abuse is detrimental. Research has revealed that one in nine men in the US admitted to online sexual offending against children (Childlight, 2025); 7% of male adults in the UK admitted the same. In both regions, even greater numbers said they would seek to commit physical sexual offences against children if they thought it would be kept secret (Childlight, 2025). The data point to how harmful existing online content already was for minors, young girls and women before Grok.

The consequences are not uniform. In Pakistan, where 62% of the population lives in rural areas and only 33% of women regularly access the internet, deepfakes weaponize existing cultural stigma around honour with devastating effect. Globally, 90% of deepfakes target women; in Pakistan, 70% of women already report feeling unsafe online. Shukria Ismail, a journalist from Khyber Pakhtunkhwa, was forced out of journalism entirely after fabricated sexual imagery was used against her. Another woman told researchers it was clear her abusers wanted to push her toward suicide or provoke an honour killing. Impunity is structural: only 92 cybercrime convictions were secured from 1,375 cases in 2023. Following historical patterns of gendered abuse, Black and Brown women bear the heaviest brunt of it all.

Can we truly be surprised? Is it really shocking that generations of patriarchal conditioning, dispossession of women’s voices and sexual objectification have bled into robbing women of their bodily autonomy online? The foundations of gendered control have been built, and the proof is in the predictability.

 

What has been done?

One of the few legislative attempts to curb abuse is the Take It Down Act, which gives platforms forty-eight hours to remove explicit content after the victim files a report. This framework is deeply problematic. Forty-eight hours. Let's think about this. Two days to process a report. Victims are responsible for finding, reporting and tracking abusive material, an almost impossible task for women who don’t have the digital literacy, resources or simply the time to scour web pages searching for distorted images of themselves. Undetected harmful content remains online, allowing the cycle of harm to continue, often without the victim’s awareness. This deepens existing structures of domination which justify men’s entitlement to sexually violate, now increasingly recognised as a right to free speech within xAI’s market logic. The fact that women were the first line of fire, the first victims of abuse at this scale, speaks to a deeper issue within tech governance: the acceptance of gendered harm as the inevitable cost of innovation.

It reveals governance’s commitment to preserving tech companies' sovereign right to ‘experiment’, while women are expected to quietly absorb the consequences. Women’s rights groups and online safety campaigners have consistently warned that weaker safeguards and permissive content policies would be exploited against women. Those warnings were dismissed as alarmist, and calls for stricter content moderation were often disregarded as censorship. Systemically, the issue is pronounced: companies fail to account for and mitigate harms against women. With the deployment of contemporary AI tools, accountability is shifted away from the platform and onto users who upload illegal content. Offloading and reframing the proliferation of abusive material as an individual misuse issue rather than a systemic failure to protect only enables generated abusive material to quietly continue. Women are then forced to bear the burden of self-regulation, which becomes impossible given the abundance and rapidity of the abuse generated.

 

Where do we go from here?

The structures that enabled sexual abuse were not deemed worth dismantling a billion-dollar tech enterprise over. xAI's choice to pursue Spicy Mode, rejecting thorough implementation of guardrails despite the foreseeable risk of sexualised harm to women, reveals the fundamental priorities of current governance models. Choosing profits over women's safety was a decision, not an oversight. In fact, the historically sustained objectification of women and girls is what makes Grok’s instant generation of sexualised abuse so profitable and so widely consumed. For example, in the US, women held 37% of computer science degrees in 1984. Decades of deliberate marketing to boys, discriminatory hiring practices, and hostile workplace culture drove that figure down to 18% by the early 2010s. Today it sits at around 21%, barely recovered despite years of pledges and programmes. The people designing these systems are overwhelmingly not the people they are designed to harm.

AI-generated sexual content goes beyond the abuse of women’s bodies; it necessitates an interrogation of the patriarchal dynamics, embedded within tech design and our wider cultural contexts, that prioritise male interests over women’s safety. The systemic design that enabled the proliferation of such harm must be challenged, dismantled and rewired, with the protection of women and girls at its centre.


 

About the Authors

Asma Derja is the Founder of the Ethical AI Alliance, a global nonprofit working at the intersection of AI ethics, power, and justice. She authored Part 1.

Sara Albakri is a student at the University of Edinburgh and Student Ambassador at the Ethical AI Alliance. She authored Part 2.

Edited by Harshini Rajachander, writer, entrepreneur and community member of the Ethical AI Alliance.